Comparison of Early and Late Omics Data Integration for Cancer Modules Gene Ranking
Abstract
In a recent work we evaluated the ability of semi-supervised learning methods based on random walks to rank genes with respect to Cancer Modules (CM) using networks constructed from different sources of information (Re and Valentini, 2012). The performance of this approach was evaluated using a relatively simple data integration scheme consisting of the unweighted sum of the adjacency matrices of the biomolecular networks involved in our experiments. Although this approach performed well, all of our tests applied network integration before the gene prioritization phase (early data integration). Recently published works demonstrated that good results can also be obtained by performing the integration step after a prioritization ranking has been produced for each available dataset (late data integration), through the integration of the ranking vectors (Kolde et al., 2012). The aim of this contribution is to compare prioritization performance on CM genes using early and late data integration methods, in order to highlight the benefits and potential pitfalls of these approaches when applied to large-scale gene prioritization problems.
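The difference between the two schemes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the short fixed-length random walk, the function names, the toy networks, and in particular the use of a plain mean of ranks for late integration, which stands in for the rank aggregation method of Kolde et al. (2012).

```python
def random_walk_scores(adj, seeds, steps=3):
    """Score genes by a short random walk started from the seed genes."""
    n = len(adj)
    # Row-normalize the adjacency matrix into transition probabilities.
    trans = []
    for row in adj:
        total = sum(row)
        trans.append([w / total if total else 0.0 for w in row])
    # Start with the probability mass spread uniformly over the seeds.
    p = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    for _ in range(steps):
        p = [sum(p[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return p


def ranks(scores):
    """Convert scores to ranks (1 = highest score; ties broken by index)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, gene in enumerate(order, start=1):
        result[gene] = position
    return result


def early_integration(networks, seeds, steps=3):
    """Sum the adjacency matrices (unweighted), then run one random walk."""
    n = len(networks[0])
    summed = [[sum(net[i][j] for net in networks) for j in range(n)]
              for i in range(n)]
    return random_walk_scores(summed, seeds, steps)


def late_integration(networks, seeds, steps=3):
    """Rank genes on each network separately, then average the ranks.

    Mean rank is an illustrative stand-in, not the method of Kolde et al.
    """
    per_network = [ranks(random_walk_scores(net, seeds, steps))
                   for net in networks]
    n = len(networks[0])
    return [sum(r[g] for r in per_network) / len(per_network)
            for g in range(n)]


# Hypothetical toy data: two networks over the same four genes.
net_a = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
net_b = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 1], [1, 0, 1, 0]]
seeds = {0}  # gene 0 is the only known member of the module

early_scores = early_integration([net_a, net_b], seeds)
late_mean_ranks = late_integration([net_a, net_b], seeds)
```

In this sketch, early integration returns one score vector (higher is better), while late integration returns a mean rank per gene (lower is better), so the two outputs have to be converted to a common scale before they are compared.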
Similar resources
Exploring Gene Signatures in Different Molecular Subtypes of Gastric Cancer (MSS/TP53+, MSS/TP53-): A Network-based and Machine Learning Approach
Gastric cancer (GC) is one of the leading causes of cancer mortality worldwide. Molecular understanding of the different subtypes of GC is still poor, and new subtype-specific diagnostic and therapeutic approaches are needed. Comprehensive research in this area is therefore required to gain deeper insight into the molecular processes underlying these subtypes. In this st...
Identification of ovarian cancer driver genes by using module network integration of multi-omics data.
The increasing availability of multi-omics cancer datasets has created a new opportunity for data integration that promises a more comprehensive understanding of cancer. The challenge is to develop mathematical methods that allow the integration and extraction of knowledge from large datasets such as The Cancer Genome Atlas (TCGA). This has led to the development of a variety of omics profiles ...
Identification of Prognostic Genes in Her2-enriched Breast Cancer by Gene Co-Expression Network Analysis
Introduction: HER2-enriched subtype of breast cancer has a worse prognosis than luminal subtypes. Recently, the discovery of targeted therapies in other groups of breast cancer has increased patient survival. The aim of this study was to identify genes that affect the overall survival of this group of patients based on a systems biology approach. Methods: Gene expression data and clinical infor...
SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy
In this paper, we propose a new gene selection algorithm based on Shuffled Frog Leaping Algorithm that is called SFLA-FS. The proposed algorithm is used for improving cancer classification accuracy. Most of the biological datasets such as cancer datasets have a large number of genes and few samples. However, most of these genes are not usable in some tasks for example in cancer classification....
Module Analysis Captures Pancancer Genetically and Epigenetically Deregulated Cancer Driver Genes for Smoking and Antiviral Response
The availability of increasing volumes of multi-omics profiles across many cancers promises to improve our understanding of the regulatory mechanisms underlying cancer. The main challenge is to integrate these multiple levels of omics profiles and especially to analyze them across many cancers. Here we present AMARETTO, an algorithm that addresses both challenges in three steps. First, AMARETTO...